Characteristics of SARS-CoV-2 spread in lithuania

Most abundant lineages

Evolutionary stability comparision


Pairwise stability comparison results for Lithuanian lineages

For each lineage a maximum distance (in days) between any of two sequences in cluster were calculated (D) for each cluster and compared with a reference lineage BA.2. The y-axis denotes the difference between mean D values comparing a lineage and BA.2 expressed as relative change. The positives values indicates greater stability and negative values denote lower stability using BA.2 lineage as a reference. The upper panel denotes significances of comparisons using different statistical tests, the red color denotes cases were the difference between a lineage and BA.2 lineage was found to be of low significance (p > 0.05)


Stability of lineages calculated using two methods

The upper panel denotes results of pairwise comparisons when all lineages were compared to the reference lineage “BA.2”. The Y axis denotes the analyzing all together or each lineage a maximum distance (in days) between any of two sequences in cluster were calculated (D) for each cluster and compared with a reference lineage BA.2. The y-axis denotes the difference between mean D values comparing a lineage and BA.2 expressed as relative change. The positives values indicates greater stability and negative values denote lower stability – the values are indicated at the bottom of the plot. The lower panel denotes results of one way ANOVA based on zero-inflated Negative Binomial distribution. The y-axis denotes estimated marginal (EM) mean of D and the error bar denotes the the 95% confidence interval of the EM mean. The values of EM mean for each lineage are indicated at the bottom of the plot.


Stability changes over time for wide spread lineages

The left panel (a) shows time trend of stability calculated as percentage difference compared to BA.2 lineage. The size of the dots denote the size of the cluster used for analysis, labels by the dots indicated the lineage and the color indicates results of Wilcoxon pairwise test with alternative hypothesis “not equal” comparing against BA.2. Also the corresponding Pearson correlation is indicated along with its significance estimate. The right panel (b) shows time trend of stability represented by the lower 95% confidence limit of EM mean (asymnp.LCL). The size of the dots denote the size of the cluster used for analysis, labels by the dots indicated the lineage. Also the results of Pearson correlation for the time trend of EM mean and its lower 95% confidence limit are indicated at the top and bottom.


Enrichment of positions with incresed sequence variability

Lineage-wise contacts enrichment.

In the panels a and b x-axis denotes a lineage which is indicated in the panel b. The lineages are ordered from the oldest to the newest ones. On the y-axis of panel a, the Fde (fraction of clusters of antibodies) that reveals significant differences is shown, comparing entropy values between residues in contact with antibodies and those not in contact, with the significance determined by the Wilcoxon test. The y-axis of panel b shows absolute changes in entropy values comparing positions for contacting and non contacting residues with the set of contacting residues was composed based on all antibody clusters. The y-axis in panel c denotes the Fde and the x-axis denotes month at which a lineage reached its maximum abundance.


Enriched positions

Positions with increased number of mutations at antibodies clusters contacting residues derived from lineage-wise analysis.P

The left panel denotes the positions for each analyzed lineages. The lineages are indicated by the facet labels (y-axis). The sequence positions are indicated in the x-axis. The size of dots denotes number of structural antibodies clusters that contacts with a position. Color denotes positional sequence cluster derived from hierarchical clustering with complete linkage, flattening cut-off based on 100 sequence positions. The right panel denotes changes over time in number of contacting structural antibodies clusters for each positional cluster. The facet labels and colors denote the positional clusters, the colors in both right and left panels matches the same positional cluster. Each dot in the right panel denote a lineage, x-axis denotes the month when the abundance of a lineage peaked. On the top of the results of corresponding Spearman correlation test are indicated.


Positions with increased number of antibodies clusters contacting comparing wordwide and regional analysis.

The left panel (a) denotes the positions for each analyzed lineages. The lineages are indicated by the facet right labels, the scope of the analysis is indicated by the left labels. “all” denotes analysis based on all GISAID data, “lt” denotes analysis using subset of Lithuanian sequences. The sequence positions are indicated in the x-axis. The size of dots denotes number of structural antibodies clusters that contacts with a position. Color denotes positional sequence cluster derived from hierarchical clustering with complete linkage, flattening cut-off based on 100 sequence positions. The right panel denotes match between positions that were detected as having increased number of antibodies contacting residues based on worldwide and Lithuanian lineages. The positions were considered to be matching if distance between them was less than 10 sequence positions apart.

Impact per antobody cluster clusters

Number of mutations at position of contacting antibodies residues.

X-axis matches a month and Y-axis matches the number of mutations at the positions of the contacting antibodies residues. Panels a and b denotes number of mutations considering all strains. Upper panels (a,c) denote worldwide data, lower panels (b,c) denote a subset matching to Lithuania. Panels c and d denotes only data matching the most abundant month-wise lineages. The pink transparent line in panels a and b denotes number of GISAID deposited sequence originating either from all world or Lithuania respectively.


Changes over time of contacts per each antibody cluster

The left panel (a) denotes changes over time for each analyzed cluster individually. The top facet denotes the set of positions matching unipn of all clusters. The cluster-wise facets from top to bottom are ordered based on tau values of Mann-Kendall test for trend test of time derived from worldwide GISAID sequences. If the corresponding p > 0.05, tau values were not indicated. The tau values “NA” denotes cases with insignificant time related trends. The right panel (b) denotes regions of sequence positions along protein S that match the epitopic region of the structural antibodies clusters. These regions result from hierarchical clustering with complete linkage, flattening cut-off based on 100 sequence positions.


Changes over time of contacts per each antibody cluster for regional sequences.

The plots denotes changes over time for each analyzed cluster individually. The top facet denotes the set of positions matching union of all clusters. The cluster-wise facets from top to bottom are ordered based on tau values of Mann-Kendall test for trend test of time derived from worldwide GISAID sequences. If the corresponding p > 0.05, tau values were not indicated. The tau values “NA” denotes cases with insignificant time related trends.


Predicted binding affinity of sampled protein S variants to antibodies

It would require enormous computational power to predict changes in binding to antibodies due to mutations for all observed combinations of mutations. The set for testing was composed as follows:

  1. For each month (based on provided metadata) on the sequences that is assigned (by pangolin) to the lineage which was detected as being the most abundant lineage for the month.
  2. A median number of mutations at the protein S residues that contacts antibodies clusters are calculated. One sequence matching the median mutation number is sampled (one sequence per each moth).
  3. Three highest number of mutations at the contacting residues are sampled for each month.
  4. For each for the sample sequences three PDB structures are chosen for energy evaluation. Out of all possible antibody – protein S structures for analysis the structured with maximum number of contacts at the mutated positions are chosen. The collected set of PDBs are then analyzed against all the sampled sequences.

Below the box plots of the the evaluations of ΔΔG biding energy terms are given: • our developed evaluation method (denoted as “ddG”); • changes in binding energy calculated by EVOEF1 (denoted as “dEVOEF1”); • changes in binding energy calculated using PRODIGY energy function (denoted as “dPRODIGY”).

Notes on mutant structures generation: only impact of substitution and deletion are evaluated. Structures containing only substitutions are generated based on EvoEF1 program, modeling of deletion/substitution mix are generated by ProMod3. In both cases rotamers are optimized using FASPR.

Results of our developed evaluation method is denoted as “ddG”. “dEVOEF1” denotes changes in binding energy calculated by EvoEF1 program. Changes in binding energy calculated based on PRODIGY energy function is denoted as “dPRODIGY”.

The corresponding data in tabular format is given in the table below. If a sequence is sampled from the sequences having the highest number of mutations it is denoted as “increased” in the column “contacts_class” (sampling is described above).

Predicted binding affinity of sampled protein S variants to ACE2

It would require enormous computational power to predict changes in binding to ACE2 receptor due to mutations for all observed combinations of mutations. The set for testing was composed as follows:

  1. For each month (based on provided metadata) on the sequences that is assigned (by pangolin) to the lineage which was detected as being the most abundant lineage for the month.
  2. A median number of mutations at the protein S residues that contacts antibodies clusters are calculated. One sequence matching the median mutation number is sampled (one sequence per each moth).
  3. Three highest number of mutations at the contacting residues are sampled for each month.
  4. For each for the sample sequences three PDB structures are chosen for energy evaluation. Only complexes containing wild type ACE2 structures were considered. Out of the ACE2 – protein S structures for analysis the structured with maximum number of contacts at the mutated positions are chosen. The collected set of PDBs are then analyzed against all the sampled sequences.

Below the box plots of the the evaluations of ΔΔG biding energy terms are given: • our developed evaluation method (denoted as “ddG”); • changes in binding energy calculated by EVOEF1 (denoted as “dEVOEF1”); • changes in binding energy calculated using PRODIGY energy function (denoted as “dPRODIGY”).

Notes on mutant structures generation: only impact of substitution and deletion are evaluated. Structures containing only substitutions are generated based on EvoEF1 program, modeling of deletion/substitution mix are generated by ProMod3. In both cases rotamers are optimized using FASPR.

Results of our developed evaluation method is denoted as “ddG”. “dEVOEF1” denotes changes in binding energy calculated by EvoEF1 program. Changes in binding energy calculated based on PRODIGY energy function is denoted as “dPRODIGY”.

The corresponding data in tabular format is given in the table below. If a sequence is sampled from the sequences having the highest number of mutations it is denoted as “increased” in the column “contacts_class” (sampling is described above).